WHAT IS CLAIMED IS:

1. A method for memory failure recovery, comprising: 

maintaining a predetermined number of duplicate and primary processes; 
keeping the processes in synchronization; 

managing the processes so that a single process image is presented to an 
external environment; 

detecting a computer system exception which affects one of the processes; and 
terminating the affected process. 
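By way of illustration only, the method of claim 1 can be modeled in a few lines of Python. This is a hypothetical sketch, not the claimed implementation: `Replica`, `ReplicaGroup`, `deliver` and `on_exception` are names invented here, the integer `state` stands in for a process's memory image, and a call to `on_exception` stands in for a detected computer system exception. Promoting a duplicate when the primary fails is an assumption about how a single process image could be preserved.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Replica:
    pid: int
    state: int = 0      # toy stand-in for the process's memory image
    alive: bool = True


class ReplicaGroup:
    """Hypothetical model: one primary plus N duplicates seen externally as one process."""

    def __init__(self, duplicates: int) -> None:
        self._ids = count(1)
        self.target = duplicates                      # predetermined number of duplicates
        self.primary = self._spawn(0)
        self.duplicates = [self._spawn(0) for _ in range(duplicates)]

    def _spawn(self, state: int) -> Replica:
        return Replica(pid=next(self._ids), state=state)

    def deliver(self, value: int) -> int:
        # Keep the processes in synchronization: every replica applies the same input.
        for replica in [self.primary, *self.duplicates]:
            replica.state += value
        # Single process image: only the primary's result is returned to the caller.
        return self.primary.state

    def on_exception(self, pid: int) -> None:
        # Terminate the replica affected by the exception.
        if pid == self.primary.pid:
            self.primary.alive = False
            self.primary = self.duplicates.pop(0)     # promote a duplicate (assumes one exists)
        else:
            victim = next(r for r in self.duplicates if r.pid == pid)
            victim.alive = False
            self.duplicates.remove(victim)
        # Restore the predetermined number by spawning copies of the surviving state.
        while len(self.duplicates) < self.target:
            self.duplicates.append(self._spawn(self.primary.state))


group = ReplicaGroup(duplicates=2)
print(group.deliver(5))                  # 5
group.on_exception(group.primary.pid)
print(group.deliver(1))                  # 6, now served by the promoted duplicate
```

The external caller never learns which replica answered, which is the sense in which the group presents a single process image.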

2. The method of claim 1 wherein the detecting element includes detecting a 
memory failure. 

3. The method of claim 1 further comprising: 

allocating a new memory space to each of the duplicate processes, which is 
separate from a memory space allocated to the primary process. 
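The point of claim 3 is isolation: a fault in one memory space should not reach the others. In an operating system this would typically be a separate address space (for example, a child created by POSIX `fork` gets its own). The sketch below only models that idea in Python, using `copy.deepcopy` so that no duplicate shares storage with the primary; `allocate_duplicate_spaces` and the dictionary layout are hypothetical.

```python
import copy


def allocate_duplicate_spaces(primary_memory: dict, n: int) -> list[dict]:
    """Give each of n duplicates its own copy of the primary's memory (hypothetical model)."""
    # deepcopy builds independent nested objects, so no duplicate shares
    # storage with the primary or with another duplicate.
    return [copy.deepcopy(primary_memory) for _ in range(n)]


primary = {"heap": [1, 2, 3], "counter": 7}
dups = allocate_duplicate_spaces(primary, 2)
dups[0]["heap"][0] = 999                      # simulate a corrupted word in one duplicate
print(primary["heap"], dups[1]["heap"])       # [1, 2, 3] [1, 2, 3]
```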

4. The method of claim 1 wherein the maintaining element includes: 
identifying a primary process; 

monitoring a fault-tolerance value corresponding to the primary process; and 
setting a number of duplicate processes equal to the fault-tolerance value.

5. The method of claim 4 wherein the monitoring element includes assigning a 
predetermined fault-tolerance value to a primary process. 

6. The method of claim 4 wherein the monitoring element includes dynamically
modifying the fault-tolerance value of the primary process, in response to a computer
command.



7. The method of claim 4 wherein the setting element includes adding a new
duplicate process, if the number of duplicate processes is less than the
fault-tolerance value.

8. The method of claim 4 wherein the setting element includes deleting a
duplicate process, if the number of duplicate processes is more than the
fault-tolerance value.
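Claims 4 through 8 together describe reconciling the number of duplicates with a per-primary fault-tolerance value. A minimal Python sketch of that loop follows, assuming duplicates are tracked as a list of process IDs; `reconcile`, `_new_pid` and the `fault_tolerance` dictionary are hypothetical names, and which duplicate to delete (here the most recently created) is an arbitrary choice the claims leave open.

```python
from itertools import count

_new_pid = count(100)   # hypothetical source of PIDs for newly created duplicates


def reconcile(duplicates: list[int], fault_tolerance: int) -> list[int]:
    """Return a duplicate list whose length equals the fault-tolerance value."""
    if fault_tolerance < 0:
        raise ValueError("fault-tolerance value must be non-negative")
    dups = list(duplicates)
    while len(dups) < fault_tolerance:    # claim 7: too few duplicates, add one
        dups.append(next(_new_pid))
    while len(dups) > fault_tolerance:    # claim 8: too many duplicates, delete one
        dups.pop()                        # here the most recently created is dropped
    return dups


fault_tolerance = {"db_server": 2}        # claim 5: predetermined value per primary
dups = reconcile([], fault_tolerance["db_server"])
print(dups)                               # [100, 101]
fault_tolerance["db_server"] = 1          # claim 6: value modified by a command
dups = reconcile(dups, fault_tolerance["db_server"])
print(dups)                               # [100]
```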



9. The method of claim 1 wherein the keeping element includes synchronizing
the processes upon receipt of data from an external environment.



10. The method of claim 1 wherein the keeping element includes synchronizing
the processes upon receipt of signals from an external environment.

11. The method of claim 1 wherein the keeping element includes synchronizing
the processes upon transmission by one of the processes to an external environment.
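Claims 9 through 11 name the moments at which the processes are synchronized: receipt of data, receipt of signals, and transmission. The claims do not say how synchronization is achieved; the Python sketch below assumes a barrier-style rendezvous (`threading.Barrier`) in which each replica thread waits at every such event so that none runs ahead. `run_replicas`, `SYNC_EVENTS` and the event names are hypothetical.

```python
import threading

SYNC_EVENTS = {"data_in", "signal_in", "data_out"}   # claims 9, 10 and 11


def run_replicas(n: int, events: list[tuple[str, int]]) -> list[list[int]]:
    """Run n replica threads that rendezvous at every synchronization event."""
    barrier = threading.Barrier(n)
    histories: list[list[int]] = [[] for _ in range(n)]

    def replica(idx: int) -> None:
        state = 0
        for kind, value in events:
            if kind in ("data_in", "signal_in"):
                state += value            # every replica applies the same input
            if kind in SYNC_EVENTS:
                barrier.wait()            # no replica proceeds until all arrive
                histories[idx].append(state)

    threads = [threading.Thread(target=replica, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return histories


h = run_replicas(3, [("data_in", 2), ("signal_in", 3), ("data_out", 0), ("tick", 1)])
print(h)   # [[2, 5, 5], [2, 5, 5], [2, 5, 5]]
```

Events outside `SYNC_EVENTS` (here `"tick"`) do not force a rendezvous, which keeps synchronization cost limited to the points the claims identify.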

12. The method of claim 1 wherein the managing element includes permitting
only one of the processes to transmit to an external environment.

13. The method of claim 1 wherein the managing element includes permitting
only one of the processes to perform a system call to an external environment.

14. The method of claim 1 wherein the managing element includes permitting
only one of the processes to perform a library call to an external environment.
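Claims 12 through 14 restrict transmissions, system calls and library calls to a single process. One way to picture this is a gate that lets only the designated process execute the external call and hands the recorded result to the duplicates. The sketch is hypothetical (`ExternalCallGate` and its parameters are invented here) and assumes the designated process reaches each call first; a real system would make duplicates wait for the result instead of failing.

```python
from typing import Any, Callable


class ExternalCallGate:
    """Hypothetical gate: only the designated process performs external calls."""

    def __init__(self, primary_id: int) -> None:
        self.primary_id = primary_id
        self._results: dict[int, Any] = {}   # call sequence number -> recorded result
        self.calls_made = 0

    def call(self, process_id: int, seq: int, fn: Callable[[], Any]) -> Any:
        if process_id == self.primary_id:
            result = fn()                    # the one real system or library call
            self.calls_made += 1
            self._results[seq] = result
            return result
        # Duplicates never touch the outside world; they reuse the recorded result.
        return self._results[seq]


sent: list[str] = []
gate = ExternalCallGate(primary_id=1)
r1 = gate.call(1, 0, lambda: sent.append("pkt") or len(sent))   # primary transmits
r2 = gate.call(2, 0, lambda: sent.append("pkt") or len(sent))   # duplicate is suppressed
print(r1, r2, sent)   # 1 1 ['pkt']
```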



15. A method for memory failure recovery, comprising:

maintaining a predetermined number of duplicate and primary processes;
keeping the processes in synchronization;

managing the processes so that a single process image is presented to an
external environment;

detecting a computer system exception which affects one of the processes; and
terminating the affected process;

wherein the maintaining element includes,
identifying a primary process;
monitoring a fault-tolerance value corresponding to the primary process; and
setting a number of duplicate processes equal to the fault-tolerance value; and

wherein the managing element includes,
permitting only one of the processes to perform a system call to an
external environment.

16. A data structure for memory failure recovery within a computer system,
comprising the fields of:

a primary process field, for identifying primary processes within the computer
system; and

a fault-tolerance variable field, for identifying a predetermined number of
duplicate processes, corresponding to the primary processes, to be maintained within
the computer system.
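The two fields of claim 16 map naturally onto a small record type. The Python sketch below is hypothetical: `RecoveryEntry`, its field names and the dictionary keyed by primary PID are illustrative choices, and the non-negativity check is an added assumption rather than something the claim requires.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEntry:
    """Hypothetical record mirroring the two fields of claim 16."""
    primary_pid: int          # primary process field
    fault_tolerance: int      # fault-tolerance variable field: duplicates to maintain

    def __post_init__(self) -> None:
        if self.fault_tolerance < 0:
            raise ValueError("fault_tolerance must be >= 0")


# One entry per protected primary process, keyed by its PID.
table = {e.primary_pid: e for e in (RecoveryEntry(4012, 2), RecoveryEntry(4077, 0))}
print(table[4012])   # RecoveryEntry(primary_pid=4012, fault_tolerance=2)
```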



17. A computer-usable medium embodying computer program code for
commanding a computer to perform memory failure recovery, comprising:

maintaining a predetermined number of duplicate and primary processes;
keeping the processes in synchronization;

managing the processes so that a single process image is presented to an
external environment;

detecting a computer system exception which affects one of the processes; and
terminating the affected process.

18. The medium of claim 17 wherein the detecting element includes detecting a
memory failure.

19. The medium of claim 17 further comprising:

allocating a new memory space to each of the duplicate processes, which is
separate from a memory space allocated to the primary process.

20. The medium of claim 17 wherein the maintaining element includes:
identifying a primary process;

monitoring a fault-tolerance value corresponding to the primary process; and
setting a number of duplicate processes equal to the fault-tolerance value.

21. The medium of claim 17 wherein the managing element includes permitting
only one of the processes to transmit to an external environment.



22. A system for memory failure recovery, comprising:

means for maintaining a predetermined number of duplicate and primary
processes;

means for keeping the processes in synchronization;

means for managing the processes so that a single process image is presented
to an external environment;

means for detecting a computer system exception which affects one of the
processes; and

means for terminating the affected process.

23. A system for memory failure recovery, comprising:

a primary process memory space hosting a primary process; 

a duplicate process memory space hosting a duplicate process corresponding 
to the primary process; 

a synchronization buffer for keeping the duplicate process in synchronization 
with the primary process; 

a processor for generating an exception signal in response to detection of a 
memory failure condition which affects the primary process; and 

an operating system for receiving the exception signal, terminating the 
affected primary process, and maintaining a predetermined number of primary and 
duplicate processes. 

24. The system of claim 23, further comprising: 

a buffer controller for permitting the processes to receive communications
from an external environment while permitting only one of the processes to transmit
to the external environment.

25. The system of claim 23, wherein the exception signal is a machine check abort 
signal. 